Apparatus for data compression encoding and decoding

ABSTRACT

A method and apparatus for decoding ordinary run length codes and run length codes that have been extended to include two classes of code words, "regular" code words for runs and "special" code words for selected special situations. The decoder comprises table storage and select/combine circuitry. The table storage holds four small tables whose values can be adjusted to correspond to any ordinary or extended run length code to be implemented. The select/combine circuitry receives successive code word bits, uses successive elements of one table to isolate the bits comprising a code word, and then combines the code word with other table values in order to calculate a binary value uniquely identifying the code word.

United States Patent
Van Voorhis

APPARATUS FOR DATA COMPRESSION ENCODING AND DECODING

Patent No.: 3,925,780
Date of Patent: Dec. 9, 1975
Inventor: David C. Van Voorhis, Los Gatos, Calif.
Assignee: International Business Machines Corporation, New York, N.Y.
Filed: Dec. 26, 1973
Appl. No.: 428,500
Int. Cl.: H03K 13/24
References Cited: U.S. Pat. No. 3,813,485, 5/1974, Arps
2 Claims, 23 Drawing Figures


BACKGROUND OF THE INVENTION

The invention relates to data compression techniques for digital images, and more particularly, to a method and apparatus for decoding the ordinary and extended run length codes required by such techniques.

A digital image is a two-dimensional array of image points, each of which represents the light intensity of a small area of a physical picture. For black/white images, each image point is a single bit of information with a value of either 0 or 1 to indicate, respectively, that the corresponding area of the picture is light or dark. These images are normally generated by scanning pictorial data, such as 8½ inch × 11 inch documents. Thereafter, the scanned pictorial data can be stored, viewed from a display, transmitted, or printed.

A variety of data compression techniques have been devised for reducing the storage requirements for digital images, and for reducing the bandwidth required for their transmission. Most of these techniques are based on some form of run length coding.

In its simplest form, run length coding of images involves two steps. First, each row of the image array is partitioned into a sequence of runs, with each run comprising one or more adjacent image points with the same binary value, i.e., 0 or 1. Second, each run of image points is replaced with a single integer that specifies the length of the run. For example, a run of 10 successive image points with the value of 0 can be replaced by the single integer 10. It is not necessary to identify explicitly the binary value of each run. It is sufficient to specify the binary value of the first run in each row, since the binary values of successive runs alternate between 0 and 1.
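The two steps can be sketched in a few lines of Python. This is only an illustration of the simplest form of run length coding described above; the function names `encode_row` and `decode_row` are hypothetical and do not appear in the patent.

```python
from itertools import groupby


def encode_row(row):
    """Partition a row of 0/1 image points into runs.

    Returns the binary value of the first run and the list of run lengths.
    """
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(row)]
    first_value = runs[0][0] if runs else 0
    return first_value, [length for _, length in runs]


def decode_row(first_value, run_lengths):
    """Rebuild the row; successive runs alternate between 0 and 1."""
    row = []
    value = first_value
    for length in run_lengths:
        row.extend([value] * length)
        value = 1 - value
    return row


row = [0] * 10 + [1] * 3 + [0] * 2
first_value, run_lengths = encode_row(row)
print(first_value, run_lengths)
```

Only the value of the first run is stored; the values of the remaining runs follow from the alternation.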

More efficient run length coding techniques use variable length binary code words, rather than integers, to represent the lengths of the various runs. The run length codes used with such techniques are designed so that the shorter code words represent the more frequently occurring runs and the longer code words represent the less frequently occurring runs. For typical applications, runs of lengths 1 to 5 occur most frequently, and the probability of occurrence for successively longer runs tends to decrease steadily thereafter. The single exception is the longest possible run, which can, for example, represent a completely white line on the printed page and therefore occurs frequently. Since the probability of occurrence tends to decrease with the length of a run, the length of the code word used to represent a run generally increases with the length of the run. For example, a run much longer than 10 is normally represented by a code word that is longer than the code word used for a run of length 10.

A slightly different group of run length coding techniques has been used when the number of image points with a binary value of 0 far exceeds the number of image points with a binary value of 1. These techniques partition each row of the image array into a number of runs of 0s, each separated by a single 1. Then, only the runs of 0s are encoded. Although it is sometimes necessary to encode the run of no 0s that appears between two adjacent 1s in a row of the image array, it is not necessary to encode any runs of 1s. This strategy is particularly effective when used in conjunction with predictive encoding, which transforms an original image array into a new array that includes few 1s. See, for example, L. Bahl et al., U.S. Pat. No. 3,769,453, "Finite Memory Adaptive Predictor."
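A minimal sketch of this variant follows. It assumes that every 1 terminates a run of 0s and that the final entry counts the 0s after the last 1; that end-of-row convention and the function names are assumptions made for illustration, not details taken from the text.

```python
def encode_zero_runs(row):
    """Return the lengths of the runs of 0s, each run ended by a single 1.

    A length of 0 records the empty run between two adjacent 1s.
    The last entry counts the 0s after the final 1 (assumed convention).
    """
    runs, count = [], 0
    for bit in row:
        if bit == 0:
            count += 1
        else:
            runs.append(count)
            count = 0
    runs.append(count)
    return runs


def decode_zero_runs(runs):
    """Rebuild the row by placing a single 1 between successive runs of 0s."""
    row = []
    for index, length in enumerate(runs):
        row.extend([0] * length)
        if index < len(runs) - 1:
            row.append(1)
    return row


print(encode_zero_runs([0, 0, 0, 1, 1, 0, 1]))
```

No run of 1s is ever encoded; the two adjacent 1s in the example simply produce a run of length 0.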

Finally, a few sophisticated data compression techniques for images use run length codes that have been extended to include a number of "special" code words in order to represent certain special situations. These special code words are used in conjunction with the regular code words used to represent runs. An example is the code described by I. Gorog et al. in the article entitled "An Experimental Low Cost Graphic Information Distribution Terminal," 1971 SID International Symposium of Technical Papers. Gorog's code includes three special code words for special situations. These special situations arise when a run in one row of an image array either ends directly beneath the end point of a corresponding run in the previous row, or ends one position to the left or right of this end point.

The primary disadvantage of previous run length coding systems is that they have used an ordinary or extended run length code which represented a compromise among three coding objectives. The objectives are high efficiency for typical images, uniformly high efficiency for a class of images, and an economical implementation. In this regard, reference should be made to N. Abramson, "Information Theory and Coding," McGraw Hill Book Co., New York, 1963 at pp. -88 for a discussion of code efficiency. Abramson's efficiency measure is based upon the value of a symbol from an information source S, which can be measured in terms of an equivalent number of binary digits needed to represent one symbol from that source. The average value of a symbol from S is denoted by H(S). Note that H(S) = Σ p_i log(1/p_i), summed over i, where p_i is the probability of the ith source symbol. Given that L is the average code word length for any uniquely decodable code for the source, it is the case that L cannot be less than H(S). Accordingly, the efficiency of the code is the ratio H(S)/L.
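The efficiency measure can be illustrated numerically. The sketch below assumes base-2 logarithms, since H(S) is described in terms of binary digits, and uses a made-up four-symbol source; the probabilities and code word lengths are illustrative values only.

```python
import math


def entropy(probabilities):
    """H(S) = sum of p_i * log2(1 / p_i), in binary digits per symbol."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


def average_length(probabilities, lengths):
    """Average code word length L for the given code word lengths."""
    return sum(p * n for p, n in zip(probabilities, lengths))


def efficiency(probabilities, lengths):
    """Code efficiency H(S) / L; it cannot exceed 1 for a uniquely decodable code."""
    return entropy(probabilities) / average_length(probabilities, lengths)


# Hypothetical source with four symbols.
probabilities = [0.5, 0.25, 0.125, 0.125]
variable_lengths = [1, 2, 3, 3]  # shorter code words for likelier symbols
fixed_lengths = [2, 2, 2, 2]     # fixed length binary integers

print(efficiency(probabilities, variable_lengths))
print(efficiency(probabilities, fixed_lengths))
```

For this source the variable length code reaches the bound L = H(S), while the fixed length code spends 2 binary digits on a source whose symbols are worth 1.75 on average.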

Taking the above coding objectives into account, the most easily implemented run length code, which uses the fixed length binary integer i as a code word for runs of length i, is not nearly as efficient as a variable length code. On the other hand, the most efficient extended run length code possible for a sample of images is the Huffman code based on the relative frequencies of runs and special situations in the sample of images. However, since run length codes for images typically require up to 5,000 code words, the Huffman code is normally difficult to implement.

Three general types of decoders are currently in use. These are the tree follower type, the table lookup type, and the encoder based type.

A tree follower decoder depends on the fact that standard variable length binary codes have a tree-like structure. The decoder includes logic circuitry corresponding to the tree, and successive code word bits cause control circuitry to traverse this tree structure. When a terminal node of the tree is reached, an entire code word has been received, and the terminal node identifies the code word.
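A software analogue of the tree follower can make the idea concrete. The sketch below replaces the logic circuitry with nested dictionaries and uses a small hypothetical prefix code that maps code words to run lengths; neither the code nor the function names come from the patent.

```python
def build_tree(code):
    """Build a binary tree (nested dicts) from a {code word: symbol} mapping."""
    root = {}
    for word, symbol in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol  # terminal node identifies the code word
    return root


def tree_decode(bits, root):
    """Follow one branch per received bit; emit a symbol at each terminal node."""
    symbols, node = [], root
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):
            symbols.append(node)
            node = root  # restart at the root for the next code word
    return symbols


# Hypothetical variable length code: code word -> run length.
code = {"0": 1, "10": 2, "110": 3, "111": 4}
tree = build_tree(code)
print(tree_decode("0101100111", tree))
```

Each received bit moves the decoder one level down the tree, so the work per code word grows with the code word length, and the tree itself grows with the number of code words.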

A table lookup decoder includes a table containing each code word as a separate entity. As successive code word bits are received, each code word must be checked to see whether it agrees with all code word bits received so far. When only one code word agrees, that code word has been received and identified. The table storage required by this table lookup type of decoder is expensive.
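The following sketch mimics this procedure in software, assuming a small hypothetical prefix code stored as a dictionary. It declares a code word identified once the bits received so far match exactly one table entry in full, which is a slightly stricter test than the description requires.

```python
def table_lookup_decode(bits, table):
    """Check every stored code word against the bits received so far."""
    symbols, received = [], ""
    for bit in bits:
        received += bit
        # Code words that agree with all bits received so far.
        candidates = [word for word in table if word.startswith(received)]
        if candidates == [received]:
            symbols.append(table[received])
            received = ""
    return symbols


# Hypothetical variable length code: code word -> run length.
table = {"0": 1, "10": 2, "110": 3, "111": 4}
print(table_lookup_decode("0101100111", table))
```

Every code word is stored and examined separately, which is why the storage for this type of decoder grows directly with the size of the code.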

An encoder based decoder includes a copy of the encoder, a bit generator, and comparison circuitry. The bit generator supplies a sequence of bits to the encoder. The encoder continuously produces the code word appropriate for the run comprising the bits generated so far. Each code word thus produced is compared with code word bits received. When a match occurs, the decoded run length is taken to be the number of bits generated by the bit generator.

SUMMARY OF THE INVENTION

It is therefore an object of this invention to provide a method and apparatus for decoding ordinary and extended run length codes that are both highly efficient for a sample of images and uniformly efficient for a class of images.
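This scheme can be sketched as a loop that lengthens a trial run one bit at a time and re-encodes it. The encoder below is a hypothetical unary-style code chosen only to keep the example short; any encoder for a prefix code could be substituted.

```python
def encode_run(length):
    """Hypothetical encoder: (length - 1) ones followed by a zero."""
    return "1" * (length - 1) + "0"


def encoder_based_decode(stream, max_run):
    """Decode a bit string by regenerating code words with the encoder."""
    lengths, position = [], 0
    while position < len(stream):
        # The bit generator extends the trial run one bit at a time.
        for generated in range(1, max_run + 1):
            word = encode_run(generated)
            if stream.startswith(word, position):
                lengths.append(generated)  # run length = bits generated
                position += len(word)
                break
        else:
            raise ValueError("no code word matches the received bits")
    return lengths


print(encoder_based_decode("011010", max_run=8))
```

The decoder needs no table of code words, but decoding a long run requires as many encoder evaluations as there are image points in the run.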

The above and other objects are believed satisfied by the description of a preferred embodiment of the invention, the apparatus comprising table storage and select/combine circuitry. The table storage is sufficient to hold four small tables whose values can be adjusted to correspond to any ordinary or extended run length code. The select/combine circuitry accepts as input the successive bits of a code word and compares the bits received so far with successive elements of one stored table until it is determined that an entire code word has been received. Then, this code word is combined with other table values to produce a binary value which uniquely identifies the code word received.

More particularly, the disclosed apparatus comprises a decoder that can isolate and identify a code word for a specific class of ordinary and extended run length codes. As will be shown, this class of codes includes a code of uniformly high efficiency for any desired data compression technique and any desired class of images.

The class of codes to be implemented includes the ordinary and extended run length codes characterized by three parameters: a maximum number N of regular code words, a maximum number M of special code words, and a maximum code word length Lmax. The class of codes to be implemented is further restricted by the requirement that the code word lengths Lr(1), Lr(2), ... for the regular code words cr(1), cr(2), ... must be monotonically increasing. The code word lengths Ls(1), Ls(2), ... for the various special code words cs(1), cs(2), ... must also be monotonically increasing. That is, the code word lengths must satisfy the relations

1 ≤ Lr(1) ≤ Lr(2) ≤ ... ≤ Lmax
1 ≤ Ls(1) ≤ Ls(2) ≤ ... ≤ Lmax

The values br(k), bs(k), tr(k), and ts(k) held in the BR, BS, TR, and TS tables are related to these code word lengths according to the formulas

br(k) = number of regular code words with length k or less;
bs(k) = number of special code words with length k or less;
ts(k) = Σ (j = 1 to k) 2^(k-j) [br(j) - br(j-1) + bs(j) - bs(j-1)], with br(0) = bs(0) = 0;
tr(k) = ts(k) - [bs(k) - bs(k-1)].

In particular, the regular and special code words for codes to be implemented must be the binary integers related to these table values according to the formulas

cr(i) = i - 1 - br(k) + tr(k), taken as a k-bit integer, where k = Lr(i);
cs(i) = i - 1 - bs(k) + ts(k), taken as a k-bit integer, where k = Ls(i).

These relationships between the monotonically increasing code word lengths, the table values, and the code words themselves are illustrated in Tables 1 and 2 for a code which includes seven regular code words and two special code words.
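The table values for the sample code can be reproduced with a short sketch. The recurrence used here for TR and TS, and the code word lengths (2, 3, 4, 4, 4, 5, 5 for the regular code words and 2, 3 for the special code words), are assumptions inferred from the sample table values and from the decoding rule i = y - tr(k) + br(k) + 1; they are not quoted from the patent text. Index 0 of each list is a zero placeholder so that index k matches the table row k.

```python
def build_tables(regular_lengths, special_lengths):
    """Return the BR, BS, TR, TS tables as lists indexed by k = 0..Lmax."""
    l_max = max(regular_lengths + special_lengths)
    br = [sum(1 for n in regular_lengths if n <= k) for k in range(l_max + 1)]
    bs = [sum(1 for n in special_lengths if n <= k) for k in range(l_max + 1)]
    tr = [0] * (l_max + 1)
    ts = [0] * (l_max + 1)
    for k in range(1, l_max + 1):
        # Assumed recurrence: k-bit code words start at 2 * ts(k-1).
        tr[k] = 2 * ts[k - 1] + br[k] - br[k - 1]
        ts[k] = tr[k] + bs[k] - bs[k - 1]
    return br, bs, tr, ts


def code_word(i, lengths, b, t):
    """Return the ith code word as a bit string of its assigned length."""
    k = lengths[i - 1]
    return format(i - 1 - b[k] + t[k], "0{}b".format(k))


regular_lengths = [2, 3, 4, 4, 4, 5, 5]  # inferred, seven regular code words
special_lengths = [2, 3]                 # inferred, two special code words
br, bs, tr, ts = build_tables(regular_lengths, special_lengths)
regular_words = [code_word(i, regular_lengths, br, tr) for i in range(1, 8)]
special_words = [code_word(i, special_lengths, bs, ts) for i in range(1, 3)]
print(tr[1:], ts[1:])
print(regular_words, special_words)
```

Under these assumptions the nine code words form a prefix code, and within each length the regular code words take the smaller binary values and the special code words the values just above them.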

TABLE VALUES FOR SAMPLE CODES

k    br(k)    bs(k)    tr(k)    ts(k)
1    0        0        0        0
2    1        1        1        2
3    2        2        5        6
4    5        2        15       15
5    7        2        32       32

A simple example will serve to illustrate that the class of ordinary and extended run length codes implemented by the disclosed decoder includes a code with uniformly high efficiency for any desired data compression technique and any desired class of images.

Suppose that the desired compression technique requires an extended run length code with code words for runs of lengths 1 through n, and with code words for m special situations. Then, the relative frequencies of occurrence for the various runs and special situations may be measured in a sample of images, and these relative frequencies may be used to separate the runs and special situations into two ordered lists of events. In particular, the successive regular events are the runs with lengths f through n-1, where f is the length of the most frequently occurring run. The successive special events are the m special situations plus the runs with lengths 1 through f-1 and the runs of length n, all taken in order of decreasing frequency of occurrence. The relative frequencies of the n-f regular events and the m+f special events are used to calculate code word lengths that are ordered and bounded according to the two relations

1 ≤ Lr(1) ≤ Lr(2) ≤ ... ≤ Lr(n-f) ≤ Lmax
1 ≤ Ls(1) ≤ Ls(2) ≤ ... ≤ Ls(m+f) ≤ Lmax

and that lead to the minimum average code word length permitted by these relations. Finally, the code word lengths are used in the previously provided formulas to calculate values for the BR, BS, TR, and TS tables, and hence to calculate the n-f regular code words cr(1), cr(2), ..., cr(n-f) and the m+f special code words cs(1), cs(2), ..., cs(m+f).

The simple code construction technique just described constructs an extended run length code whose code word lengths are both monotonically increasing and bounded. As will be shown, this code is normally both highly efficient for the sample images and uniformly efficient for similar images not in the sample.

For typical data compression techniques the relative frequencies for runs reach their maximum value for runs of length f, where f is less than 5, and then the relative frequencies for successively longer runs tend to decrease for runs with lengths between f and n-1. Therefore, the above code construction technique places the regular events approximately in order of decreasing relative frequency, and it places the special events exactly in order of decreasing relative frequency, so that monotonically increasing code word lengths lead to a code that is highly efficient for the sample images. Furthermore, E. N. Gilbert shows in his article "Codes Based on Inaccurate Source Probabilities," IEEE Transactions on Information Theory, Vol. IT-17, pp. 304-314, May 1971, that using a bound Lmax ≥ log2(n+m) for the length of the longest code words tends to promote uniformly high code efficiency.
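The first step of the construction, separating the events into the regular and special lists, can be sketched as follows. The frequencies are made-up values, the function name and event labels are hypothetical, and the later step of choosing code word lengths that minimize the average length is not attempted here.

```python
def order_events(run_freq, special_freq):
    """Split events into the ordered regular and special lists.

    run_freq[j - 1] is the relative frequency of runs of length j (1..n);
    special_freq lists the relative frequencies of the m special situations.
    """
    n = len(run_freq)
    # f is the length of the most frequently occurring run.
    f = max(range(1, n + 1), key=lambda j: run_freq[j - 1])
    # Regular events: runs of lengths f through n-1, in order of length.
    regular = [("run", j) for j in range(f, n)]
    # Special events: the m special situations, runs 1..f-1, and the run n.
    candidates = [(("special", s), p) for s, p in enumerate(special_freq, start=1)]
    candidates += [(("run", j), run_freq[j - 1]) for j in range(1, f)]
    candidates.append((("run", n), run_freq[n - 1]))
    special = [event for event, _ in sorted(candidates, key=lambda item: -item[1])]
    return f, regular, special


# Hypothetical measurements: n = 6 run lengths and m = 2 special situations.
run_freq = [0.10, 0.30, 0.20, 0.10, 0.05, 0.15]
special_freq = [0.06, 0.04]
f, regular, special = order_events(run_freq, special_freq)
print(f, regular, special)
```

With these values the regular list holds n-f = 4 events and the special list holds m+f = 4 events, and the frequently occurring longest run lands at the head of the special list, where it can receive a short code word.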

The operation of the disclosed decoder, which uses the stored tables BR, BS, TR, and TS to isolate and identify code words, can be summarized as follows. As successive bits y1, y2, ... of a code word are received, the select/combine circuitry concatenates these bits into a single integer for comparison with successive elements of the TS table. When for some value of k the resulting k-bit integer y = y1 y2 ... yk is found to satisfy the relation y < ts(k), it is known that y is a k-bit code word which must be identified. To this end, the code word y is compared with the table value tr(k). If y < tr(k), then the decoder calculates the event designation s = 0, i = y - tr(k) + br(k) + 1, which signifies that y is the ith regular code word cr(i). Alternatively, if y ≥ tr(k), then the decoder calculates the event designation s = 1, i = y - ts(k) + bs(k) + 1, which signifies that y is the ith special code word cs(i).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a generalized block diagram of a data compression encoding apparatus in accordance with the present invention;
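The decoding procedure translates almost directly into software. The sketch below uses the sample table values for the code with seven regular and two special code words; the leading zero in each list is a placeholder added so that index k matches the table row k, and the function name is hypothetical.

```python
# Sample table values for k = 1..5; index 0 is an unused placeholder.
BR = [0, 0, 1, 2, 5, 7]
BS = [0, 0, 1, 2, 2, 2]
TR = [0, 0, 1, 5, 15, 32]
TS = [0, 0, 2, 6, 15, 32]


def decode(bits):
    """Return a list of (s, i) pairs: s = 0 regular, s = 1 special."""
    events, y, k = [], 0, 0
    for bit in bits:
        y = 2 * y + int(bit)  # concatenate the new bit onto y
        k += 1
        if y < TS[k]:         # a complete k-bit code word has been isolated
            if y < TR[k]:
                events.append((0, y - TR[k] + BR[k] + 1))
            else:
                events.append((1, y - TS[k] + BS[k] + 1))
            y, k = 0, 0       # start accumulating the next code word
    return events


print(decode("00" "101" "1110" "11111" "01"))
```

Each code word is isolated with one comparison per received bit and identified with one further comparison and two additions, regardless of how many code words the code contains.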

FIG. 2 is a generalized block diagram of a data compression decoder in accordance with the present invention;

FIG. 3 is a detailed block diagram of one embodiment of encoder select/combine apparatus 14 of FIG. 1;

FIG. 4 is a more detailed block diagram of an embodiment of encoder tables 16 of FIG. 1;

FIG. 5 is a detailed diagram of a T router 160 of FIG.

FIG. 6 is a detailed diagram of a B router 161 of FIG.

FIG. 7 is a detailed block diagram of shift registers 182 of FIG. 4;

FIG. 8 is a detailed block diagram of an embodiment of shift out circuitry 17 of FIG. 1;

FIG. 9 is a detailed block diagram of an embodiment of the decoder select/combine apparatus 51 of FIG. 2;

FIG. 10 is a more detailed block diagram of an embodiment of the decoder S tables 52 of FIG. 2;

FIG. 11 is a detailed diagram of the TS router 314 of FIG. 10;

FIG. 12 is a detailed diagram of the BS router 315 of FIG. 10;

FIG. 13 is a detailed block diagram of the S shift registers 323 of FIG. 10;

FIG. 14 is a more detailed block diagram of an embodiment of decoder R tables 53 of FIG. 2;

FIG. 15 is a detailed diagram of the TR router 364 of FIG. 14;

FIG. 16 is a detailed diagram of the BR router 365 of FIG. 14;

FIG. 17 is a detailed block diagram of the R shift registers 373 of FIG. 14;

FIG. 18 is an illustration of the outputs of the encoder tables 16 of FIG. 1 and of the decoder tables 52 and 53 of FIG. 2;

FIG. 19 is a partial block diagram of another embodiment of an encoder select/combine apparatus 14 of FIG. 1;

FIG. 20 is a partial block diagram of another embodiment of a decoder select/combine apparatus 51 of FIG. 2;

FIG. 21 is a detailed block diagram of an embodiment of an event recognizer 10 of FIG. 1;

FIG. 22 is a detailed block diagram of an embodiment of an event regenerator 50 of FIG. 2; and

FIG. 23 is a detailed block diagram of scan line buffers 432 in FIG. 21 and scan line buffers 540 of FIG. 22.

DESCRIPTION OF PREFERRED EMBODIMENTS

The data compression encoding and decoding circuitry of the present invention may be implemented in various preferred forms and arrangements. One such embodiment is illustrated by the encoding arrangement of FIG. 1 and the decoding arrangement of FIG. 2.

FIG. 1 includes an event recognizer 10. The present invention relates to the compression of information, wherein elements of information may be characterized as events. An information element, or event, may comprise a binary encoded representation of an alphanumeric character, an analog voltage, a run of binary video information, a run of binary image information, or any other type of information capable of recognition. The events may be further characterized as regular or special.

The event recognizer 10 is designed to recognize each element of the specific type of information which is presented to it and to supply a binary output therefrom characterizing each received unit of information. The event characterization comprises a single bit of information to indicate whether the event is regular or special, and an event designation number that uniquely identifies the event.

Although the present invention can be employed in a wide variety of information environments, including those in which no events are classified as special, it is currently anticipated that the most advantageous usage of the present invention will be in the field of run length encoding. In such a circumstance, the event recognizer 10 may recognize each sequence of consecutive bits of the same logic level, which is denoted as a run. Runs may also be sequences of bits of one of the logic levels, or sequences of a level terminating in another level. All runs can be classified as regular events, and the event designation number for a run can simply be the length of that run. Alternatively, runs of certain lengths can be classified as special events. For example, runs of length 1 or 2 can be classified as special events and characterized by event designation numbers 1 or 2, while the remaining runs can be classified as regular events, with the event designation number for a run of length 3 or greater simply two less than the length of that run. Thus, a run of length 4 would be characterized by event

1. An apparatus (FIGS. 2 and 9) for decoding variable length code words y into symbols s and i indicative of extended run length code words, where s designates either that the number i represents one of n-f code words for ordinary runs of lengths f through n-1, or that the number i represents one of m+f code words for special events, taken as any of m special situations, runs of lengths 1 through f-1, or runs of length n, where f is the most frequently occurring run length, the variable length code words being constrained such that the maximum code word length is bounded by a quantity Lmax ≥ log2(m+n) and all code word lengths are ordered according to the formulas
1 ≤ Lr(1) ≤ Lr(2) ≤ ... ≤ Lr(n-f) ≤ Lmax
1 ≤ Ls(1) ≤ Ls(2) ≤ ... ≤ Ls(m+f) ≤ Lmax
the apparatus comprising:
a first stored table TS, a second stored table TR, a third stored table BR, and a fourth stored table BS (52, 53);
means for accumulating successive code word bits y1, y2, ..., yk to form a single integer y = y1 y2 ... yk;
means (51) for comparing y with successive elements of the TS table such that if y ≥ ts(k), then another bit of y is accumulated; if y < ts(k), then the code word y is decoded;
means (51) for comparing y with successive elements of the TR table and, if y < tr(k), for computing the values s and i from y and the table values tr(k) and br(k), such that s = 0 and i = y - tr(k) + br(k) + 1, thus signifying that y is the ith regular code word cr(i); and
means (51) for comparing y with successive elements of the TR table and, if y ≥ tr(k), for computing the values s and i from y and the table values ts(k) and bs(k), such that s = 1 and i = y - ts(k) + bs(k) + 1, thus signifying that y is the ith special code word cs(i).
2. An apparatus according to claim 1, wherein br(k) = number of regular code words with length k or less; bs(k) = number of special code words with length k or less;